Overview

Dataset statistics

Number of variables: 10
Number of observations: 824
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 64.5 KiB
Average record size in memory: 80.2 B

Variable types

NUM: 10

Reproduction

Analysis started: 2022-11-13 16:59:56.017754
Analysis finished: 2022-11-13 17:00:22.238046
Duration: 26.22 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
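
The command line recorded above can also be assembled from a script. The sketch below only builds the argument list exactly as the report shows it. `build_profile_command` and the file name `concrete.csv` are hypothetical, and running the result assumes the pandas-profiling command-line tool is installed.

```python
import shlex


def build_profile_command(csv_path, config_path="config.yaml"):
    """Return the argument list for the command line shown in the report.

    Hypothetical helper: it mirrors the report's command line and adds
    no options beyond the ones the report shows.
    """
    return ["pandas_profiling", "--config_file", config_path, csv_path]


# "concrete.csv" is a placeholder standing in for [YOUR_FILE.csv].
command = build_profile_command("concrete.csv")
print(shlex.join(command))
```

The list form can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting problems when the file path contains spaces.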
Warnings

Id has unique values [Unique]
slag has 377 (45.8%) zeros [Zeros]
flyash has 461 (55.9%) zeros [Zeros]
superplasticizer has 304 (36.9%) zeros [Zeros]
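
The zero percentages in these warnings follow directly from the zero counts and the 824 observations. A minimal sketch of that arithmetic, where `zero_share` is a hypothetical helper name and the counts are copied from the warnings:

```python
# Counts taken from the warnings; 824 is the number of observations.
N_OBSERVATIONS = 824
zero_counts = {"slag": 377, "flyash": 461, "superplasticizer": 304}


def zero_share(count, total):
    """Percentage of zero-valued cells, rounded to one decimal place."""
    return round(100 * count / total, 1)


zero_shares = {name: zero_share(c, N_OBSERVATIONS) for name, c in zero_counts.items()}
print(zero_shares)
```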

Variables

Id
Real number (ℝ≥0)

UNIQUE
Distinct count: 824
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 513.8470873786408
Minimum: 0
Maximum: 1028
Zeros: 1
Zeros (%): 0.1%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 50.15
Q1: 251.75
Median: 513.5
Q3: 770.25
95-th percentile: 974.85
Maximum: 1028
Range: 1028
Interquartile range (IQR): 518.5

Descriptive statistics

Standard deviation: 296.7867789
Coefficient of variation (CV): 0.5775780115
Kurtosis: -1.20657835
Mean: 513.8470874
Median Absolute Deviation (MAD): 259.5
Skewness: 0.000627465727
Sum: 423410
Variance: 88082.39214
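
Several of the figures reported for each variable are derived from the others. As a sanity check, the sketch below recomputes the range, IQR, coefficient of variation, variance and sum for `Id` from the values listed above. The variable names are illustrative, and the formulas are the standard textbook definitions rather than anything read from the pandas-profiling source.

```python
import math

# Values copied from the Id statistics in the report.
n = 824
minimum, maximum = 0, 1028
q1, q3 = 251.75, 770.25
mean = 513.8470873786408
std = 296.7867789

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n                  # Sum

# Each recomputed value should agree with the reported one up to rounding.
print(value_range, iqr, round(cv, 6), round(variance, 2), round(total))
print(math.isclose(cv, 0.5775780115, rel_tol=1e-6))
```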
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
995                 1      0.1%
202                 1      0.1%
608                 1      0.1%
143                 1      0.1%
1000                1      0.1%
197                 1      0.1%
609                 1      0.1%
279                 1      0.1%
293                 1      0.1%
400                 1      0.1%
Other values (814)  814    98.8%

Value               Count  Frequency (%)
0                   1      0.1%
1                   1      0.1%
4                   1      0.1%
5                   1      0.1%
6                   1      0.1%

Value               Count  Frequency (%)
1028                1      0.1%
1027                1      0.1%
1026                1      0.1%
1024                1      0.1%
1023                1      0.1%

cement
Real number (ℝ≥0)

Distinct count: 254
Unique (%): 30.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 283.36080097087375
Minimum: 102.0
Maximum: 540.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 102
5-th percentile: 143.615
Q1: 192
Median: 275.1
Q3: 359.9
95-th percentile: 491
Maximum: 540
Range: 438
Interquartile range (IQR): 167.9

Descriptive statistics

Standard deviation: 107.5364039
Coefficient of variation (CV): 0.3795034581
Kurtosis: -0.6077758768
Mean: 283.360801
Median Absolute Deviation (MAD): 83.9
Skewness: 0.4933427896
Sum: 233489.3
Variance: 11564.07816
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
425                 17     2.1%
362.6               16     1.9%
251.4               13     1.6%
475                 13     1.6%
310                 13     1.6%
349                 12     1.5%
250                 12     1.5%
446                 11     1.3%
236                 10     1.2%
331                 10     1.2%
Other values (244)  697    84.6%

Value               Count  Frequency (%)
102                 4      0.5%
108.3               4      0.5%
116                 3      0.4%
122.6               4      0.5%
132                 2      0.2%

Value               Count  Frequency (%)
540                 7      0.8%
531.3               5      0.6%
528                 1      0.1%
525                 7      0.8%
522                 2      0.2%

slag
Real number (ℝ≥0)

ZEROS
Distinct count: 166
Unique (%): 20.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 74.37160194174757
Minimum: 0.0
Maximum: 359.4
Zeros: 377
Zeros (%): 45.8%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 22
Q3: 144.775
95-th percentile: 236
Maximum: 359.4
Range: 359.4
Interquartile range (IQR): 144.775

Descriptive statistics

Standard deviation: 86.97778445
Coefficient of variation (CV): 1.169502635
Kurtosis: -0.5182968517
Mean: 74.37160194
Median Absolute Deviation (MAD): 22
Skewness: 0.8020652798
Sum: 61282.2
Variance: 7565.134988
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
0                   377    45.8%
189                 24     2.9%
106.3               17     2.1%
24                  11     1.3%
20                  9      1.1%
98.1                9      1.1%
19                  8      1.0%
145                 8      1.0%
26                  7      0.8%
203.5               6      0.7%
Other values (156)  348    42.2%

Value               Count  Frequency (%)
0                   377    45.8%
11                  4      0.5%
13.6                2      0.2%
15                  5      0.6%
17.2                1      0.1%

Value               Count  Frequency (%)
359.4               2      0.2%
342.1               1      0.1%
316.1               2      0.2%
305.3               3      0.4%
290.2               2      0.2%

flyash
Real number (ℝ≥0)

ZEROS
Distinct count: 130
Unique (%): 15.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.16080097087379
Minimum: 0.0
Maximum: 195.0
Zeros: 461
Zeros (%): 55.9%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 118.3
95-th percentile: 166.85
Maximum: 195
Range: 195
Interquartile range (IQR): 118.3

Descriptive statistics

Standard deviation: 64.0006463
Coefficient of variation (CV): 1.203906734
Kurtosis: -1.321913729
Mean: 53.16080097
Median Absolute Deviation (MAD): 0
Skewness: 0.5660377685
Sum: 43804.5
Variance: 4096.082726
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
0                   461    55.9%
118.3               15     1.8%
141                 14     1.7%
24.5                13     1.6%
79                  11     1.3%
95.7                9      1.1%
121.6               9      1.1%
94                  9      1.1%
100.4               8      1.0%
167                 8      1.0%
Other values (120)  267    32.4%

Value               Count  Frequency (%)
0                   461    55.9%
24.5                13     1.6%
59                  1      0.1%
71                  1      0.1%
71.5                1      0.1%

Value               Count  Frequency (%)
195                 3      0.4%
194.9               1      0.1%
194                 1      0.1%
190                 1      0.1%
185.3               1      0.1%

water
Real number (ℝ≥0)

Distinct count: 179
Unique (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.79708737864075
Minimum: 121.8
Maximum: 247.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 121.8
5-th percentile: 146.13
Q1: 164.9
Median: 185.35
Q3: 192
95-th percentile: 228
Maximum: 247
Range: 125.2
Interquartile range (IQR): 27.1

Descriptive statistics

Standard deviation: 21.32190452
Coefficient of variation (CV): 0.1172840821
Kurtosis: 0.1765762128
Mean: 181.7970874
Median Absolute Deviation (MAD): 13
Skewness: 0.09197296622
Sum: 149800.8
Variance: 454.6236124
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
192                 97     11.8%
228                 43     5.2%
185.7               36     4.4%
203.5               30     3.6%
186                 25     3.0%
162                 17     2.1%
164.9               16     1.9%
153.5               13     1.6%
200                 12     1.5%
193                 11     1.3%
Other values (169)  524    63.6%

Value               Count  Frequency (%)
121.8               5      0.6%
126.6               4      0.5%
137.8               3      0.4%
140                 1      0.1%
140.8               4      0.5%

Value               Count  Frequency (%)
247                 1      0.1%
246.9               1      0.1%
237                 1      0.1%
236.7               1      0.1%
228                 43     5.2%

superplasticizer
Real number (ℝ≥0)

ZEROS
Distinct count: 105
Unique (%): 12.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.1639563106796125
Minimum: 0.0
Maximum: 32.2
Zeros: 304
Zeros (%): 36.9%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 6.1
Q3: 10.125
95-th percentile: 16.085
Maximum: 32.2
Range: 32.2
Interquartile range (IQR): 10.125

Descriptive statistics

Standard deviation: 5.967257716
Coefficient of variation (CV): 0.9680889051
Kurtosis: 1.265712251
Mean: 6.163956311
Median Absolute Deviation (MAD): 5.6
Skewness: 0.8977497366
Sum: 5079.1
Variance: 35.60816464
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
0                   304    36.9%
11.6                28     3.4%
8                   17     2.1%
7                   14     1.7%
9.9                 13     1.6%
7.8                 13     1.6%
9                   13     1.6%
16.5                13     1.6%
10                  12     1.5%
6                   12     1.5%
Other values (95)   385    46.7%

Value               Count  Frequency (%)
0                   304    36.9%
1.7                 4      0.5%
1.9                 1      0.1%
2                   1      0.1%
2.5                 2      0.2%

Value               Count  Frequency (%)
32.2                3      0.4%
28.2                5      0.6%
23.4                4      0.5%
22.1                1      0.1%
22                  5      0.6%

coarseaggregate
Real number (ℝ≥0)

Distinct count: 258
Unique (%): 31.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 973.5485436893204
Minimum: 801.0
Maximum: 1145.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 801
5-th percentile: 842
Q1: 932
Median: 968
Q3: 1040.6
95-th percentile: 1104.51
Maximum: 1145
Range: 344
Interquartile range (IQR): 108.6

Descriptive statistics

Standard deviation: 78.69463012
Coefficient of variation (CV): 0.08083277473
Kurtosis: -0.6438676258
Mean: 973.5485437
Median Absolute Deviation (MAD): 52.5
Skewness: -0.04148492161
Sum: 802204
Variance: 6192.84481
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
932                 46     5.6%
852.1               39     4.7%
944.7               24     2.9%
1125                21     2.5%
968                 20     2.4%
967                 16     1.9%
1047                15     1.8%
942                 10     1.2%
822                 9      1.1%
938                 9      1.1%
Other values (248)  615    74.6%

Value               Count  Frequency (%)
801                 4      0.5%
801.4               1      0.1%
811                 1      0.1%
814                 1      0.1%
814.1               1      0.1%

Value               Count  Frequency (%)
1145                1      0.1%
1134.3              5      0.6%
1130                1      0.1%
1125                21     2.5%
1124.4              2      0.2%

fineaggregate
Real number (ℝ≥0)

Distinct count: 274
Unique (%): 33.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 772.1074029126214
Minimum: 594.0
Maximum: 992.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 594
5-th percentile: 613
Q1: 726.775
Median: 778.5
Q3: 821.25
95-th percentile: 895.895
Maximum: 992.6
Range: 398.6
Interquartile range (IQR): 94.475

Descriptive statistics

Standard deviation: 80.98471665
Coefficient of variation (CV): 0.1048878904
Kurtosis: -0.1344581102
Mean: 772.1074029
Median Absolute Deviation (MAD): 45.5
Skewness: -0.2399879142
Sum: 636216.5
Variance: 6558.524332
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
755.8               24     2.9%
594                 24     2.9%
613                 20     2.4%
670                 18     2.2%
801                 14     1.7%
746.6               13     1.6%
887.1               13     1.6%
712                 11     1.3%
845                 10     1.2%
780.1               10     1.2%
Other values (264)  667    80.9%

Value               Count  Frequency (%)
594                 24     2.9%
605                 5      0.6%
611.8               5      0.6%
612                 1      0.1%
613                 20     2.4%

Value               Count  Frequency (%)
992.6               4      0.5%
945                 2      0.2%
943.1               4      0.5%
942                 4      0.5%
925.7               4      0.5%

age
Real number (ℝ≥0)

Distinct count: 14
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.661407766990294
Minimum: 1
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12.25
Median: 28
Q3: 56
95-th percentile: 180
Maximum: 365
Range: 364
Interquartile range (IQR): 43.75

Descriptive statistics

Standard deviation: 60.47570164
Coefficient of variation (CV): 1.354093045
Kurtosis: 13.07549467
Mean: 44.66140777
Median Absolute Deviation (MAD): 21
Skewness: 3.33541121
Sum: 36801
Variance: 3657.310489
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
28                  350    42.5%
3                   110    13.3%
7                   94     11.4%
56                  72     8.7%
14                  46     5.6%
90                  44     5.3%
100                 39     4.7%
180                 21     2.5%
91                  20     2.4%
270                 9      1.1%
Other values (4)    19     2.3%

Value               Count  Frequency (%)
1                   2      0.2%
3                   110    13.3%
7                   94     11.4%
14                  46     5.6%
28                  350    42.5%

Value               Count  Frequency (%)
365                 9      1.1%
360                 5      0.6%
270                 9      1.1%
180                 21     2.5%
120                 3      0.4%

csMPa
Real number (ℝ≥0)

Distinct count: 701
Unique (%): 85.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.8578640776699
Minimum: 2.33
Maximum: 82.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.6 KiB

Quantile statistics

Minimum: 2.33
5-th percentile: 11.3645
Q1: 23.685
Median: 34.08
Q3: 45.8625
95-th percentile: 67.28
Maximum: 82.6
Range: 80.27
Interquartile range (IQR): 22.1775

Descriptive statistics

Standard deviation: 16.86509934
Coefficient of variation (CV): 0.4703319557
Kurtosis: -0.2738606121
Mean: 35.85786408
Median Absolute Deviation (MAD): 10.81
Skewness: 0.4619332841
Sum: 29546.88
Variance: 284.4315757
Histogram with fixed size bins (bins=10)
Value               Count  Frequency (%)
33.4                4      0.5%
23.52               4      0.5%
77.3                4      0.5%
71.3                4      0.5%
79.3                4      0.5%
31.35               3      0.4%
17.54               3      0.4%
28.63               3      0.4%
39.3                3      0.4%
64.3                3      0.4%
Other values (691)  789    95.8%

Value               Count  Frequency (%)
2.33                1      0.1%
3.32                1      0.1%
4.57                1      0.1%
4.78                1      0.1%
4.9                 1      0.1%

Value               Count  Frequency (%)
82.6                1      0.1%
81.75               1      0.1%
80.2                1      0.1%
79.99               1      0.1%
79.4                1      0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
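
The definition above translates almost directly into code. The following is a minimal pure-Python sketch, not the routine pandas-profiling itself uses. `pearson_r` and the sample data are illustrative, and the function assumes neither input is constant. The example lines have different slopes and offsets to illustrate the invariance under location and scale.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
steep = [10 * x + 3 for x in xs]      # steep increasing line
shallow = [0.1 * x - 7 for x in xs]   # shallow increasing line
falling = [-2 * x for x in xs]        # decreasing line

print(pearson_r(xs, steep), pearson_r(xs, shallow), pearson_r(xs, falling))
```

Both increasing lines give r close to +1 regardless of slope, while the decreasing line gives r close to -1.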

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
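
As a sketch of this recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. The helper names are illustrative, and giving tied values the average of their positions is an assumption about tie handling, since the text does not specify it. Inputs are assumed to be non-constant.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but not linear in x
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

For the cubic data ρ is 1 up to floating-point error, while Pearson's r is below 1 because the relationship is monotonic but not linear.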

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
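
The pair-counting definition can be sketched as follows. This is the simple variant described in the text (often called tau-a), and `kendall_tau` is an illustrative name. Statistical libraries commonly use tie-adjusted variants, so their results can differ from this sketch when the data contain ties.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs.

    Pairs tied in x or in y count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

The loop visits every pair, so the cost grows quadratically with the number of observations. That is fine for a dataset of this size.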

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

     Id   cement  slag   flyash  water  superplasticizer  coarseaggregate  fineaggregate  age  csMPa
0    995  158.6   148.9  116.0   175.1  15.0              953.3            719.7          28   27.68
1    507  424.0   22.0   132.0   178.0  8.5               822.0            750.0          28   62.05
2    334  275.1   0.0    121.4   159.5  9.9               1053.6           777.5          3    23.80
3    848  252.0   97.0   76.0    194.0  8.0               835.0            821.0          28   33.40
4    294  168.9   42.2   124.3   158.3  10.8              1080.8           796.2          3    7.40
5    286  181.4   0.0    167.0   169.6  7.6               1055.6           777.8          28   27.77
6    938  154.8   183.4  0.0     193.3  9.1               1047.4           696.7          28   18.29
7    447  178.0   129.8  118.6   179.9  3.6               1007.3           746.8          56   48.59
8    692  212.0   141.3  0.0     203.5  0.0               973.4            750.0          90   39.70
9    652  102.0   153.0  0.0     192.0  0.0               887.0            942.0          3    4.57

Last rows

     Id   cement  slag   flyash  water  superplasticizer  coarseaggregate  fineaggregate  age  csMPa
814  308  277.1   0.0    97.4    160.6  11.8              973.9            875.6          100  55.64
815  661  141.3   212.0  0.0     203.5  0.0               971.8            748.5          7    10.39
816  130  323.7   282.8  0.0     183.8  10.3              942.7            659.9          28   74.70
817  663  133.0   200.0  0.0     192.0  0.0               927.4            839.2          28   27.87
818  871  159.0   187.0  0.0     176.0  11.0              990.0            789.0          28   32.76
819  87   286.3   200.9  0.0     144.7  11.2              1004.6           803.7          3    24.40
820  330  246.8   0.0    125.1   143.3  12.0              1086.8           800.9          14   42.22
821  466  190.3   0.0    125.2   166.6  9.9               1079.0           798.9          100  33.56
822  121  475.0   118.8  0.0     181.1  8.9               852.1            781.5          28   68.30
823  860  314.0   0.0    113.0   170.0  10.0              925.0            783.0          28   38.46